Web Collection 2012

Web crawl data from the year 2012. Some of this data is currently not publicly accessible.

192,502
RESULTS

PART OF
Web Collections
Web Crawls
Media Type
web (192,502)
Topics & Subjects
crawldata (152,946)
wiki (3,503)
dumps (3,495)
incremental (3,494)
media (3,494)
tape (3,494)
Collection
Web Crawls (192,081)
Internet Archive Web Crawls (134,403)
Worldwide Web Crawls (95,684)
Wide Crawl started April 2012 (39,252)
Youtube Videos (32,297)
Wide Crawl started January 2012 (30,362)
Creator
internet archive (140,978)
archive-it (21,490)
thumper2.php (11,966)
wikimedia projects editors (3,513)
portuguese web archive (273)
www.engadget.com (17)
Language
English (3,037)
Portuguese (275)
Italian (26)
Anyhub (19)
Swedish (13)
German (9)
Wide Crawl started January 2012
web
eye 4.2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
vkontakte.ru
web
eye 2M
favorite 0
comment 0
Source: vkontakte.ru
Wide Crawl started January 2012
web
eye 1.6M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 948,090
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-20T21:59:15 UTC to 2012-07-21T06:44:52 UTC.
Topic: crawldata
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=12, accountId=593, organizationName="Government Printing Office", collectionId=3142, collectionName="CFPB"
Youtube Videos
web
eye 836,439
favorite 0
comment 0
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl440.us.archive.org:youtube from Sat Jul 21 05:39:04 PDT 2012 to Fri Jul 20 23:46:43 PDT 2012.
Topic: crawldata
Japan Earthquake
web
eye 829,337
favorite 0
comment 0
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=507, accountId=156, organizationName="Virginia Tech: Crisis, Tragedy, and Recovery Network", collectionId=2438, collectionName="Japan Earthquake"
recurrence=WEEKLY, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=3, accountId=575, organizationName="Chicago-Kent College of Law", collectionId=2817, collectionName="Chicago-Kent College of Law"
Live Web Proxy Crawls
web
eye 799,437
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 12:37:06 PST 2012 to Tue Jan 3 06:31:55 PST 2012.
Topic: crawldata
The Archive Team Just In Time Grabs
web
eye 778,849
favorite 0
comment 0
Wikipedia Outlinks February 2012
web
eye 598,428
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Mar 1 20:56:37 PST 2012 to Thu Mar 1 14:19:48 PST 2012.
Topic: crawldata
Election Crawl 2012
web
eye 574,949
favorite 0
comment 0
Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Fri Jun 29 21:24:09 PDT 2012 to Thu Jan 10 20:37:50 PST 2013.
Topic: crawldata
Live Web Proxy Crawls
web
eye 538,170
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat May 12 21:38:20 PDT 2012 to Sat May 12 18:43:53 PDT 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 527,763
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Alexa Crawls
by thumper2.php
web
eye 478,136
favorite 0
comment 0
Alexa crawl
Topic: crawldata
Wikipedia Outlinks February 2012
web
eye 452,700
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Oct 18 19:09:44 PDT 2012 to Thu Oct 18 13:30:19 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 434,562
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 433,782
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 429,398
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 416,737
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 405,558
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Wed Mar 7 23:01:46 PST 2012 to Wed Mar 7 17:53:26 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 393,868
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 11:14:32 PST 2012 to Tue Jan 3 04:36:49 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 391,323
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 14:35:17 PST 2012 to Tue Jan 3 08:02:37 PST 2012.
Topic: crawldata
Archive Team
web
eye 384,822
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, as screenshots and in WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Wide Crawl started January 2012
web
eye 381,144
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 381,135
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-22T06:44:42 UTC to 2012-11-22T18:14:08 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 365,591
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Archive Team
web
eye 355,514
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, as screenshots and in WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Wide Crawl started September 2012
web
eye 354,620
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl410.us.archive.org:wide from Wed Nov 7 02:28:32 PST 2012 to Tue Nov 6 20:50:40 PST 2012.
Topic: crawldata
NLS_2012
web
eye 348,342
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sat Jun 16 16:29:25 PDT 2012 to Wed Dec 12 11:44:11 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 346,896
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-06-26T02:05:01 UTC to 2012-06-26T10:40:25 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 345,086
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-13T10:53:21 UTC to 2012-12-13T16:53:32 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 344,691
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
google.co.jp
web
eye 342,469
favorite 0
comment 0
Source: google.co.jp
Live Web Proxy Crawls
web
eye 341,427
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-31T05:25:29 UTC to 2013-01-01T02:09:59 UTC.
Topic: crawldata
NLS_2012
web
eye 336,977
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Wed Oct 31 04:28:48 PDT 2012 to Wed Dec 12 01:48:33 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 332,078
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T21:29:12 UTC to 2013-01-23T21:42:32 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 329,702
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-30T15:07:01 UTC to 2012-12-31T14:15:26 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 310,358
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Mon Mar 5 18:11:24 PST 2012 to Mon Mar 5 12:34:16 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 306,298
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-18T16:45:10 UTC to 2012-12-19T09:20:10 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 303,307
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-14T17:34:12 UTC to 2012-12-15T10:27:19 UTC.
Topic: crawldata
Wide Crawl started April 2012
web
eye 302,303
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Archive Team
web
eye 298,886
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, as screenshots and in WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
The Archive Team Just In Time Grabs
web
eye 295,391
favorite 1
comment 0
Live Web Proxy Crawls
web
eye 291,586
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-22T04:31:46 UTC to 2012-11-22T19:27:11 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 289,852
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 288,646
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 285,747
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Feb 28 04:16:59 PST 2012 to Tue Feb 28 03:44:00 PST 2012.
Topic: crawldata
Wide Crawl started April 2012
web
eye 285,342
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun May 13 07:09:39 PDT 2012 to Sun May 13 01:49:20 PDT 2012.
Topic: crawldata
Archive Team
web
eye 284,634
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, as screenshots and in WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Wikipedia Outlinks February 2012
web
eye 282,463
favorite 0
comment 0
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Sat Feb 11 14:58:31 PST 2012 to Sat Feb 11 08:33:22 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 281,888
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-30T19:10:53 UTC to 2012-12-31T10:26:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 280,913
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Jan 1 09:53:25 PST 2012 to Sun Jan 1 03:42:17 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 280,316
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-30T04:16:42 UTC to 2012-07-30T11:28:26 UTC.
Topic: crawldata
ask.com
web
eye 280,077
favorite 0
comment 0
Source: ask.com
Live Web Proxy Crawls
web
eye 279,357
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Feb 12 06:50:14 PST 2012 to Sun Feb 12 02:22:07 PST 2012.
Topic: crawldata
reddit.com
web
eye 278,429
favorite 0
comment 0
Source: reddit.com
reddit.com
web
eye 276,371
favorite 0
comment 0
Source: reddit.com
Live Web Proxy Crawls
web
eye 275,517
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T00:05:42 UTC to 2012-12-27T16:35:17 UTC.
Topic: crawldata
Wide Crawl started April 2012
web
eye 275,100
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu May 3 12:36:06 PDT 2012 to Thu May 3 07:30:25 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 271,468
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-10-30T19:58:33 UTC to 2012-10-31T18:47:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 270,608
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-28T19:46:00 UTC to 2012-12-29T15:05:14 UTC.
Topic: crawldata
Archive Team
web
eye 269,588
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites, as screenshots and in WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
ameblo.jp
web
eye 269,335
favorite 0
comment 0
Source: ameblo.jp
Live Web Proxy Crawls
web
eye 268,506
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-10-15T10:26:27 UTC to 2012-10-15T18:41:53 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 268,487
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-07T06:06:22 UTC to 2012-12-10T21:54:23 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 268,301
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-27T10:56:04 UTC to 2012-07-27T17:15:08 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 265,149
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-20T18:25:28 UTC to 2012-12-21T16:41:39 UTC.
Topic: crawldata
NLS_2012
web
eye 263,707
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sun Jul 8 09:00:15 PDT 2012 to Wed Dec 12 01:48:28 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,637
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T22:44:21 UTC to 2012-12-15T05:16:24 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,575
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat Mar 24 09:36:36 PDT 2012 to Sat Mar 24 08:42:19 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 259,447
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T07:49:06 UTC to 2012-12-28T07:29:11 UTC.
Topic: crawldata
Wide Crawl started September 2012
web
eye 259,206
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Wed Nov 14 17:39:11 PST 2012 to Wed Nov 14 10:55:11 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 258,463
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-28T22:26:32 UTC to 2012-12-29T15:19:50 UTC.
Topic: crawldata
yahoo.co.jp
web
eye 258,204
favorite 0
comment 0
Source: yahoo.co.jp
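
Three entries in this listing (the CFPB, Japan Earthquake, and Chicago-Kent College of Law crawls) give their crawl settings as a comma-separated run of key=value pairs, with quoted values that may themselves contain commas. A minimal Python sketch of reading such a line into a dictionary might look like the following. The function name parse_crawl_settings and the type coercions are illustrative assumptions, not an Internet Archive or Archive-It API, and treating maxDuration as a number of seconds (259200 s would be 72 hours) is also an assumption.

```python
import re

# One key=value pair. The value is either a double-quoted string (which may
# contain commas) or a bare token that runs up to the next comma.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_crawl_settings(line):
    """Parse 'key=value, key=value, ...' into a dict (hypothetical helper)."""
    settings = {}
    for key, raw in PAIR.findall(line):
        raw = raw.strip()
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            value = raw[1:-1]          # strip the surrounding quotes
        elif raw == "null":
            value = None               # e.g. maxDocumentCount=null
        elif raw in ("true", "false"):
            value = raw == "true"      # e.g. isTestCrawl=false
        elif raw.isdigit():
            value = int(raw)           # e.g. seedCount=12
        else:
            value = raw                # e.g. recurrence=NONE
        settings[key] = value
    return settings


# The CFPB line, copied from the listing above.
example = (
    'recurrence=NONE, maxDuration=259200, maxDocumentCount=null, '
    'isTestCrawl=false, seedCount=12, accountId=593, '
    'organizationName="Government Printing Office", '
    'collectionId=3142, collectionName="CFPB"'
)
settings = parse_crawl_settings(example)

# Assumption: maxDuration is expressed in seconds.
max_duration_hours = settings["maxDuration"] / 3600
```

Matching quoted values before bare tokens is what keeps an organization name such as "Virginia Tech: Crisis, Tragedy, and Recovery Network" in one piece instead of splitting it at its internal commas.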